feat(plugins-soniox): surface per-run language segments by rosetta-livekit-bot[bot] · Pull Request #1602 · livekit/agents-js

rosetta-livekit-bot · 2026-05-25T15:33:50Z

Summary

Fixes #5685 (and the follow-up source-side symptom raised in the comment thread, which @chenghao-mou approved bundling into the same PR).

Both halves are the same plugin bug: _TokenAccumulator._lang_segments is built per-run by the existing coalescing logic but then dropped in send_endpoint_transcript (and the interim path). The fix surfaces it through new SpeechData fields on the target side, and stops dropping it on the source side in non-translation mode.

Changes

- stt.SpeechData: add target_languages / target_texts (symmetric to existing source_languages / source_texts). Same LanguageCode coercion in __post_init__. Default None, so the addition is strictly additive for every other plugin.
- Soniox plugin, translation mode: populate target_* from final._lang_segments on FINAL_TRANSCRIPT and INTERIM_TRANSCRIPT / PREFLIGHT_TRANSCRIPT. Consumers now see the per-run target breakdown for code-switched two-way translation, e.g. target_languages=["en", "es"] / target_texts=["Hello, how are you?", " Estoy bien, gracias."] for the translation of "Hello, ¿cómo estás? I'm doing fine, gracias.".
- Soniox plugin, non-translation mode: populate source_* from the same accumulator (previously None). A code-switched ja + en utterance now surfaces source_languages=["ja", "en"] / source_texts=["こんにちは、私の名はサムです。", " My name is Sam."] -- matches what the SpeechData docstring already promised for "multi-language detection services".
- Refactor: extract a _lang_segments_to_fields helper to DRY the conversion across both modes and both event paths; the four duplicated inline list comprehensions collapse to one named operation. The predicate that distinguishes source from target became data-presence-based (final_original._lang_segments) rather than config-based (is_translation_mode is not None), which is what unified both halves cleanly.

SpeechData.text and SpeechData.language are unchanged for back-compat (still the full concatenation and the first translated/detected language, respectively).
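
As a rough illustration of the per-run coalescing and the segments-to-fields conversion described in the Changes list, here is a minimal TypeScript sketch. The `Token` and `LangSegment` shapes and the `coalesceRuns` / `langSegmentsToFields` names are hypothetical stand-ins, not the plugin's actual types or helpers, and `undefined` is assumed here as the counterpart of the `None` default.

```typescript
// Hypothetical shapes: the real plugin's token and segment types may differ.
interface Token {
  text: string;
  language: string;
}

interface LangSegment {
  language: string;
  text: string;
}

// Coalesce consecutive tokens that share a language into one run.
function coalesceRuns(tokens: Token[]): LangSegment[] {
  const segments: LangSegment[] = [];
  for (const token of tokens) {
    const last = segments[segments.length - 1];
    if (last !== undefined && last.language === token.language) {
      last.text += token.text;
    } else {
      segments.push({ language: token.language, text: token.text });
    }
  }
  return segments;
}

// Convert runs into parallel language/text lists. An empty input maps to
// [undefined, undefined] so callers can leave the optional fields unset.
function langSegmentsToFields(
  segments: LangSegment[],
): [string[] | undefined, string[] | undefined] {
  if (segments.length === 0) {
    return [undefined, undefined];
  }
  return [segments.map((s) => s.language), segments.map((s) => s.text)];
}

const tokens: Token[] = [
  { text: 'Hello,', language: 'en' },
  { text: ' how are you?', language: 'en' },
  { text: ' Estoy bien,', language: 'es' },
  { text: ' gracias.', language: 'es' },
];
const [targetLanguages, targetTexts] = langSegmentsToFields(coalesceRuns(tokens));
console.log(targetLanguages); // [ 'en', 'es' ]
console.log(targetTexts); // [ 'Hello, how are you?', ' Estoy bien, gracias.' ]
```

Joining the per-run texts reproduces the full concatenated text, which is the same invariant the live verification checks with "".join(target_texts) == text.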

Test plan

- 14 new unit tests in tests/test_plugin_soniox_stt.py covering:
  - SpeechData.__post_init__ target_languages coercion (strings → LanguageCode, None stays None, existing LanguageCode passthrough)
  - _TokenAccumulator._lang_segments per-run coalescing
  - _lang_segments_to_fields helper edge cases (empty → (None, None), non-empty → parallel lists with LanguageCode coercion)
  - Two-way translation, code-switched (the issue's canonical example)
  - One-way translation (single target run)
  - "none" untranslated chunk inside a translated utterance (asymmetric per-run list lengths)
  - Interim path: translation mode merging final + non-final per run on both sides
  - Interim path: non-translation mode populates source_* from final + non-final merged
  - Non-translation single-language: source_* populated, target_* None
  - Non-translation code-switched JA+EN: source_* carries the per-run breakdown
- Live-verified end-to-end against the real Soniox WebSocket API in console mode:
  - Translation mode, code-switched "Hello, ¿cómo estás? I'm doing fine, gracias." → text="Hello, how are you? Estoy bien, gracias.", target_languages=["en", "es"], target_texts=["Hello, how are you?", " Estoy bien, gracias."], "".join(target_texts) == text. Source side unchanged.
  - Non-translation mode, code-switched " こんにちは、私の名はサムです。 My name is Sam." → text=" こんにちは、私の名はサムです。 My name is Sam.", source_languages=["ja", "en"], source_texts=[" こんにちは、私の名はサムです。", " My name is Sam."], target_* correctly None. Interim events also surface the multi-language source breakdown progressively as the user code-switches.
- ruff format clean, ruff check clean, no new mypy --strict errors introduced in changed files.

Follow-ups (intentionally not in this PR)

- The final / final_original accumulator names are honest about routing today but the new target_* fields make their two-mode roles more glaring (final is "primary user-facing accumulator", final_original is "source-side accumulator that's empty in non-translation mode"). Worth a separate behavior-preserving rename PR to final_primary / final_source.
- The new target_* fields are wired in Soniox only; other translation-capable plugins (Gladia, Deepgram v2, AWS) can adopt them in follow-up PRs.

changeset-bot · 2026-05-25T15:33:55Z

🦋 Changeset detected

Latest commit: 836f146

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 34 packages

Name	Type
@livekit/agents	Patch
@livekit/agents-plugin-soniox	Patch
@livekit/agents-plugin-anam	Patch
@livekit/agents-plugin-assemblyai	Patch
@livekit/agents-plugin-baseten	Patch
@livekit/agents-plugin-bey	Patch
@livekit/agents-plugin-cartesia	Patch
@livekit/agents-plugin-cerebras	Patch
@livekit/agents-plugin-deepgram	Patch
@livekit/agents-plugin-elevenlabs	Patch
@livekit/agents-plugin-fishaudio	Patch
@livekit/agents-plugin-google	Patch
@livekit/agents-plugin-hedra	Patch
@livekit/agents-plugin-hume	Patch
@livekit/agents-plugin-inworld	Patch
@livekit/agents-plugin-lemonslice	Patch
@livekit/agents-plugin-liveavatar	Patch
@livekit/agents-plugin-livekit	Patch
@livekit/agents-plugin-minimax	Patch
@livekit/agents-plugin-mistral	Patch
@livekit/agents-plugin-mistralai	Patch
@livekit/agents-plugin-neuphonic	Patch
@livekit/agents-plugin-openai	Patch
@livekit/agents-plugin-perplexity	Patch
@livekit/agents-plugin-phonic	Patch
@livekit/agents-plugin-resemble	Patch
@livekit/agents-plugin-rime	Patch
@livekit/agents-plugin-runway	Patch
@livekit/agents-plugin-sarvam	Patch
@livekit/agents-plugin-silero	Patch
@livekit/agents-plugin-tavus	Patch
@livekit/agents-plugins-test	Patch
@livekit/agents-plugin-trugen	Patch
@livekit/agents-plugin-xai	Patch

Co-authored-by: rosetta-livekit-bot[bot] <282703043+rosetta-livekit-bot[bot]@users.noreply.github.com>

devin-ai-integration

Devin Review found 2 new potential issues.

View 9 additional findings in Devin Review.

devin-ai-integration · 2026-05-28T21:06:40Z

+      if (data === SpeechStream.FLUSH_SENTINEL) {
+        continue;
+      }
+      ws.send(data.data.buffer);


🔴 Sending data.data.buffer may transmit incorrect bytes when AudioFrame's typed array is a view into a larger ArrayBuffer

In #sendAudio, ws.send(data.data.buffer) sends the entire underlying ArrayBuffer of the Int16Array. If the AudioFrame.data typed array is a view with a non-zero byteOffset or doesn't span the full buffer (e.g., after resampling or slicing), this sends more/wrong bytes than intended. Other plugins (e.g., Deepgram, ElevenLabs) typically send the typed array directly or use Buffer.from(data.buffer, data.byteOffset, data.byteLength) to handle this correctly.

Affected code in stt.ts

ws.send(data.data.buffer) should be ws.send(Buffer.from(data.data.buffer, data.data.byteOffset, data.data.byteLength)) or simply ws.send(data.data) which the ws library handles correctly for typed arrays.

Suggested change

-      ws.send(data.data.buffer);
+      ws.send(Buffer.from(data.data.buffer, data.data.byteOffset, data.data.byteLength));
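
To make the view-versus-buffer distinction concrete, here is a small standalone sketch, not taken from the plugin. It builds an `Int16Array` view with a non-zero `byteOffset` and compares the byte length of `.buffer` with a view bounded by `byteOffset` and `byteLength`. A `Uint8Array` is used so the snippet has no Node-specific dependency; `Buffer.from(arrayBuffer, byteOffset, length)` takes the same three arguments.

```typescript
// A 4-sample backing buffer; the "frame" is a view over the last two samples.
const backing = new Int16Array([1, 2, 3, 4]);
const frame = backing.subarray(2); // byteOffset 4, byteLength 4

// .buffer ignores the view bounds and exposes all 8 bytes of the backing store.
const wholeBuffer = frame.buffer;

// Bounding by byteOffset and byteLength covers only the bytes the view spans.
const exact = new Uint8Array(frame.buffer, frame.byteOffset, frame.byteLength);

console.log(frame.byteOffset); // 4
console.log(wholeBuffer.byteLength); // 8
console.log(exact.byteLength); // 4
```

Sending `wholeBuffer` in this situation would transmit the two leading samples that are not part of the frame, which is the failure mode the finding describes.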

devin-ai-integration · 2026-05-28T21:06:42Z

+    });
+
+    try {
+      await Promise.race([sendTask, listenTask, waitForAbort(this.abortSignal)]);


🔴 Using Promise.race instead of waiting for both send and listen tasks causes premature WebSocket closure and lost final transcripts

In #runWS, await Promise.race([sendTask, listenTask, waitForAbort(this.abortSignal)]) means that when the audio input ends and sendTask resolves, the code immediately enters the finally block and closes the WebSocket — without waiting for the server to send its final transcription and finished message. Unlike Deepgram which uses Promise.all([sendTask(), listenTask.result, ...]) to wait for both sides to complete, this plugin closes the connection before receiving the server's final response. While message handlers are technically still attached during the WebSocket closing handshake, this relies on fragile timing behavior and the server being fast enough to flush before the close completes.

Prompt for agents

In plugins/soniox/src/stt.ts in the #runWS method, the Promise.race on line 262 causes the WebSocket to close as soon as sendTask resolves (audio input ends), without waiting for the server to send back its final transcription and 'finished' message. The fix should restructure this so that after sendTask completes, the code waits for listenTask to resolve (i.e., the server sends finished or error), while still respecting the abort signal. A common pattern (used by the Deepgram plugin) is to use Promise.all for sendTask+listenTask and then race that against the abort signal. Something like: await Promise.race([Promise.all([sendTask, listenTask]), waitForAbort(this.abortSignal)]). This ensures the server has time to flush remaining transcriptions after audio input ends.
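
The difference between the two awaiting strategies can be sketched with timers standing in for the real tasks. Everything below is hypothetical scaffolding rather than the plugin's code: `runOnce`, the timings, and the `closed` flag are assumptions, and the model drops any message that arrives after close, which is stricter than real WebSocket closing behavior.

```typescript
// Hypothetical stand-ins for the plugin's tasks; the timings are arbitrary.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function runOnce(waitForBoth: boolean): Promise<string[]> {
  const received: string[] = [];
  let closed = false;

  // sendTask finishes first, as it would when audio input ends.
  const sendTask = sleep(10);
  // listenTask sees the server's final message a little later.
  const listenTask = sleep(30).then(() => {
    if (!closed) received.push('final transcript');
  });
  // An abort signal that never fires in this demo.
  const aborted = new Promise<void>(() => {});

  try {
    if (waitForBoth) {
      // Wait for send AND listen, while still letting an abort win.
      await Promise.race([Promise.all([sendTask, listenTask]), aborted]);
    } else {
      // Returns as soon as the first task settles.
      await Promise.race([sendTask, listenTask, aborted]);
    }
  } finally {
    closed = true; // stands in for closing the WebSocket
  }
  await listenTask; // let the timer settle before inspecting the result
  return received;
}

async function main(): Promise<void> {
  console.log(await runOnce(false)); // []
  console.log(await runOnce(true)); // [ 'final transcript' ]
}

void main();
```

With the plain race, the simulated socket is closed before the final message arrives; wrapping the two tasks in `Promise.all` keeps it open until the listener finishes, and the outer race still lets an abort cut the wait short.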

feat(plugins-soniox): surface per-run language segments

304ffe6

chenghao-mou and others added 3 commits May 25, 2026 20:12

refactoring

8084edb

address comment

96ea001

fix(soniox): surface STT server errors (#1626)

836f146

rosetta-livekit-bot[bot] wants to merge 4 commits into main from manuring-hurling-eloped
